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Abstract 

We  consider  the  problem  of  network  formation  in  a  distributed  fashion.  Network  formation  is  modeled  as  a 
strategic -form  game,  where  agents  represent  nodes  that  form  and  sever  unidirectional  links  with  other  nodes  and 
derive  utilities  from  these  links.  Furthermore,  agents  can  form  links  only  with  a  limited  set  of  neighbors.  Agents 
trade  off  the  benefit  from  links,  determined  by  a  distance-dependent  reward  function,  and  the  cost  of  maintaining 
links.  When  each  agent  acts  independently  trying  to  maximize  its  own  utility  function,  we  can  characterize  “stable” 
networks  through  the  notion  of  Nash  equilibrium.  In  fact,  the  introduced  reward  and  cost  functions  lead  to  Nash 
equilibria  (networks)  which  exhibit  several  desirable  properties  such  as  connectivity,  bounded-hop  diameter  and 
efficiency  (i.e.,  minimum  number  of  links).  Since  Nash  networks  may  not  necessarily  be  efficient,  we  also  explore 
the  possibility  of  “shaping”  the  set  of  Nash  networks  through  the  introduction  of  state-based  utility  functions. 
Such  utility  functions  may  represent  dynamic  phenomena  such  as  establishment  costs  (either  positive  or  negative). 
Finally,  we  show  how  Nash  networks  can  be  the  outcome  of  a  distributed  learning  process.  In  particular,  we  extend 
previous  learning  processes  to  so-called  “state-based”  weakly  acyclic  games  and  we  show  that  the  proposed  network 
formation  games  belong  to  this  class  of  games. 


I.  Introduction 

Recent  advances  in  ad-hoc  network  technologies  demand  the  development  of  efficient  overlay  routing 
or  network  formation  protocols  over  complex  physical  network  structures,  such  as  internet,  cellular  and 
wireless  networks.  The  objective  of  such  overlay  routing  schemes  is  to  achieve  certain  routing  properties, 
for  example,  small  network  diameter,  small  congestion  and  minimum  communication  cost,  without  the 
need  to  standardize  or  deploy  a  new  routing  protocol  [2],  The  advantage  of  overlay  routing  in  such 
complex  infrastructures  can  be  significant,  e.g.,  to  divert  congested  traffic  in  cellular  netwroks  [3],  Other 
scenarios  where  overlay  routing  can  be  advantageous  are,  for  example,  peer-to-peer  hie  transfering  and 
end-host  multicasting  [4], 

The  approaches  that  have  been  proposed  for  overlay  routing  include  mostly  centralized  optimization 
schemes,  where  the  information  needed  for  each  node  to  calculate  an  optimal  routing  path  may  involve  the 
collection  of  information  from  the  whole  network  (see,  e.g.,  [4],  [3],  [5]).  However,  centralized  schemes 
may  suffer  from  several  issues  related  to  energy  conservation,  information  and  computational  complexity. 
Thus,  recent  trends  in  wireless  networks  technology  (not  necessarily  restricted  to  overlay  routing)  have 
focused  more  on  decentralized  schemes  [6],  when  information  and  computational  capacity  available  to 
each  node  are  limited. 

To  this  end,  more  recent  approaches  utilize  distributed  optimization  techniques  to  address  the  problem  of 
efficient  overlay  routing.  In  particular,  several  game-theoretic  approaches  have  been  considered,  where  each 
node  acts  independently  trying  to  maximize  its  own  utility  function  or  performance  measure.  The  definition 

This  work  was  supported  by  AFOSR  grants  #FA9550-09- 1-0420  and  #FA9550-10-l-0573.  An  earlier  version  of  part  of  this  paper 
appeared  in  [1], 

G.C.  Chasparis  is  with  the  Department  of  Automatic  Control,  Lund  University,  Lund  221  00-SE,  Sweden.  E-mail:  geor¬ 
gios. chasparis@control.lth.se.  URL:  http://www.control.lth.se/chasparis.  G.C.  Chasparis  is  also  supported  by  the  Swedish  Research  Council 
through  the  Linneaus  Center  LCCC. 

J.S.  Shamma  is  with  the  School  of  Electrical  and  Computer  Engineering,  Georgia  Institute  of  Technology,  Atlanta,  GA  30332.  E- 
mail:  shamma@gatech.edu.  URL:  http://www.prism.gatech.edu/~jshamma3. 


Report  Documentation  Page 


Form  Approved 
OMB  No.  0704-0188 


Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 
VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 


1.  REPORT  DATE 

2013 


2.  REPORT  TYPE 


4.  TITLE  AND  SUBTITLE 

Network  Formation:  Neighborhood  Structures,  Establishment  Costs,  and 
Distributed  Learning 

6.  AUTHOR(S) 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Georgia  Institute  of  Technology, School  of  Electrical  and  Computer 
Engineering, Atlanta, GA, 30332 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 


3.  DATES  COVERED 

00-00-2013  to  00-00-2013 

5a.  CONTRACT  NUMBER 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 


12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

to  appear  in  IEEE  Transactions  on  Cybernetics,  2013 

14.  ABSTRACT 

We  consider  the  problem  of  network  formation  in  a  distributed  fashion.  Network  formation  is  modeled  as 
a  strategic-form  game,  where  agents  represent  nodes  that  form  and  sever  unidirectional  links  with  other 
nodes  and  derive  utilities  from  these  links.  Furthermore,  agents  can  form  links  only  with  a  limited  set  of 
neighbors.  Agents  trade  off  the  benefit  from  links,  determined  by  a  distance-dependent  reward  function, 
and  the  cost  of  maintaining  links.  When  each  agent  acts  independently  trying  to  maximize  its  own  utility 
function,  we  can  characterize  ?stable?  networks  through  the  notion  of  Nash  equilibrium.  In  fact,  the 
introduced  reward  and  cost  functions  lead  to  Nash  equilibria  (networks)  which  exhibit  several  desirable 
properties  such  as  connectivity,  bounded-hop  diameter  and  efficiency  (i.e.,  minimum  number  of  links). 
Since  Nash  networks  may  not  necessarily  be  efficient,  we  also  explore  the  possibility  of  ?shaping?  the  set  of 
Nash  networks  through  the  introduction  of  state-based  utility  functions.  Such  utility  functions  may 
represent  dynamic  phenomena  such  as  establishment  costs  (either  positive  or  negative).  Finally,  we  show 
how  Nash  networks  can  be  the  outcome  of  a  distributed  learning  process.  In  particular,  we  extend  previous 
learning  processes  to  so-called  ?state-based?  weakly  acyclic  games  and  we  show  that  the  proposed  network 
formation  games  belong  to  this  class  of  games. 

15.  SUBJECT  TERMS 


16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 

18.  NUMBER 

19a.  NAME  OF 

ABSTRACT 

OF  PAGES 

RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Same  as 
Report  (SAR) 

20 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


2 


of  such  utilities  is  open-ended,  e.g.,  end-to-end  delays  in  the  node’s  connections  [7],  and  constitutes 
one  of  the  main  challenges  for  such  approaches.  The  main  question  emerging  is  whether  such  utility 
function  exists  that  (a)  assumes  minimum  available  information  to  each  node  (preferably  local),  and  (b) 
guarantees  the  establishment  of  desirable  network  configurations  when  each  node  maximizes  myopically 
its  own  performance  measure.  Since  robust  solutions  in  such  a  distributed  optimization  framework  can 
be  described  with  respect  to  Nash  equilibria ,  naturally,  a  secondary  question  emerging  is  whether  routing 
layouts  which  correspond  to  Nash  equilibria  exhibit  desirable  properties  (e.g.,  connectivity,  small  diameter 
or  small  number  of  links). 

This  paper  presents  a  distributed  optimization  approach  to  the  establishment  of  efficient  networks  for 
overlay  routing.  Motivated  by  the  current  research  on  social  networks  [8],  [9],  we  model  the  problem  as 
a  strategic-form  game.  In  this  framework,  each  agent  represents  a  node  and  decides  which  links  to  form 
with  its  neighboring  nodes  so  that  its  own  utility  is  maximized.  Our  goal  is  to  explore  how  decision  rules 
at  the  node  level  can  justify  the  emergence  of  various  network  configurations .  This  work  can  also  serve 
as  a  design  tool  for  network  formation,  e.g.,  for  overlay  routing  and  topology  control  in  ad-hoc  networks 
[10],  [11],  where  minimum  communication  cost  is  a  common  requirement. 

The  remainder  of  the  paper  is  organized  as  follows.  Section  II  discusses  related  work  on  the  subject  of 
efficient  network  formation  and  states  the  contributions  of  this  paper.  Section  III  presents  the  necessary 
terminology  and  introduces  the  framework  of  state -based  utility  functions.  Section  IV  presents  the  network 
formation  model  and  the  different  versions  of  reward/cost  functions.  Section  V  analyzes  the  set  of  Nash  and 
efficient  networks  of  the  network  formation  games.  Section  VI  presents  two  different  learning  processes 
and  analyzes  their  convergence  properties  when  applied  to  network  formation  games.  Section  VI  also 
presents  selected  illustrative  simulations.  Finally,  Section  VII  presents  concluding  remarks. 

II.  Related  Work 

Several  models  for  distributed  network  formation  have  been  proposed  that  are  based  on  game-theoretic 
formulations.  These  include  static  models ,  where  the  problem  of  network  formation  is  usually  modelled 
as  a  strategic-form  game  of  several  agents  (or  nodes),  where  agents’  actions  correspond  to  network  links. 
Such  approach  was  first  considered  by  [12]  where  agents  propose  links  sequentially  which  are  then 
accepted  or  rejected,  forming  an  extensive-form  game.  These  studies  characterize  networks  in  terms  of 
the  Nash  equilibria  of  the  associated  game,  called  Nash  networks.  An  improvement  over  such  model  was 
presented  in  [13],  where  agents  simultaneously  announce  their  preferences  for  links,  and  a  link  is  formed 
only  if  both  agents  agree.  One  of  the  issues  emerging  in  such  model  is  the  multiplicity  of  Nash  equilibria. 
Reference  [14]  further  discussed  the  relationship  of  the  emerging  equilibria  with  the  concept  of  pairwise 
stability.  Another  static  model,  which  is  closer  to  the  work  of  cooperative  games  [15],  was  presented  in 
[8],  where  agents  benefit  from  direct  or  indirect  connections  to  other  nodes  ( connections  model).  Reference 
[8]  discusses  one  of  the  main  issues  present  in  network  formation  games,  that  is,  the  discrepancy  between 
efficient  and  stable  networks.  Some  extensions  of  these  models  include  reference  [14],  which  deals  with 
the  problem  of  constructing  utility  functions  for  which  efficient  networks  are  pairwise  stable,  and  reference 
[16],  which  extends  [8]  to  the  case  of  directed  links. 

Although  static  models  capture  the  stability  properties  of  certain  networks  based  on  node  (agent) 
objectives,  they  do  not  capture  how  these  networks  emerge.  Such  questions  can  be  answered  by  designing 
dynamic  models  to  capture  how  agents  make  decisions  and  how  these  decisions  contribute  over  time  to 
network  formation.  For  example,  reference  [17]  models  network  formation  as  a  dynamic  process,  according 
to  which  at  each  time  instant,  a  pair  of  nodes  is  randomly  selected  and  a  link  is  added  between  them  if 
both  agents  benefit  from  it. 

One  of  the  main  issues  of  dynamic  formulations  is  the  fact  that  the  dynamic  process  may  cycle.  To 
avoid  such  cycles,  [18]  introduced  random  perturbations  to  the  formation  process.  Somewhat  parallel  to 
[18],  reference  [9]  develops  models  of  network  formation  that  use  tools  from  noncooperative  game  theory , 
according  to  which  agents  can  form  and  sever  links  unilaterally,  i.e.,  no  mutual  consent  is  needed  to  form 
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a  link  between  two  agents.  A  more  recent  work  [19]  has  extended  the  model  of  [9]  to  the  case  where 
agents  may  establish  links  only  with  a  subset  of  other  agents.  A  similar  extension  has  been  analyzed 
independently  in  both  the  current  and  an  earlier  version  of  this  paper  [1], 

A  central  implication  of  unilateral  link  formation  [9]  is  that  it  leads  to  the  concept  of  Nash  equilibrium. 
Naturally,  best-reply  dynamics  have  been  utilized  to  study  convergence  to  Nash  equilibria.  Such  dynamics 
have  also  been  employed  in  models  with  bilateral  link  formation  [20].  Under  such  dynamics,  agents  are 
usually  aware  of  all  other  agents’  actions  implying  large  information  and  computational  complexity.  To 
avoid  the  resulting  complexity,  several  dynamic  models  assume  that  agents  learn  how  to  play  the  game 
through  time  by  adaptively  reacting  to  measurements  of  their  own  performance  (utility).  A  few  models 
that  belong  to  this  category  include  [21],  [22],  where  agents  adaptively  form  and  sever  links  in  reaction 
to  an  evolving  network,  and  in  some  models,  their  decisions  are  subject  to  small  random  perturbations. 
The  rewards  received  from  each  agent  determine  which  interactions  will  be  reinforced,  and  the  network 
structure  emerges  as  a  consequence  of  the  agents’  learning  behavior. 

Recently,  distributed  network  formation  have  also  been  considered  as  a  way  for  distributed  overlay 
routing  over  complex  physical  network  structures,  e.g.,  internet,  cellular  and  wireless  networks.  The 
approaches  that  have  been  proposed  for  overlay  routing  include  mostly  centralized  optimization  schemes. 
For  example,  in  [4],  a  centralized  shortest-path  algorithm  is  used  to  find  overlay  links  that  satisfy  a  QoS 
requirement.  Centralized  information  is  also  necessary  in  [3]  for  overlay  routing  in  cellular  networks 
where  each  base  station  needs  to  exchange  information  with  all  the  available  sources.  Also,  in  [5],  a 
centralized  scheme  for  bandwidth-aware  routing  is  introduced,  where  each  node  periodically  measures 
bandwidth  capacity  to  every  node  in  the  network. 

Game-theoretic  approaches  have  also  been  considered  to  address  the  problem  of  overlay  routing  in  order 
to  avoid  issues  due  to  information  and  communication  complexity  of  centralized  schemes.  For  example, 
[23]  models  the  problem  of  overlay  routing  in  large-scale  content  sharing  applications  as  a  strategic-form 
game.  However,  the  resulting  game  may  not  exhibit  Nash  equilibria,  while  Nash  equilibria  (if  exist)  cannot 
be  explicitly  characterized.  A  noncooperative  game  formulation  for  overlay  routing  is  also  considered  in 
[7],  where  the  cost  function  of  each  node  accounts  for  the  end-to-end  delays  in  its  connections.  Reference 

[7]  characterizes  the  best-reply  strategy  for  each  node  and  computes  pure  Nash  equilibria  through  iterative 
best-response  search. 

The  above  game-theoretic  formulations  for  distributed  network  formation  or  overlay  routing  reveal 
some  of  the  main  issues  present  in  such  approaches,  that  is:  (a)  information  complexity  for  best-reply 
computation,  i.e.,  computation  of  best-reply  assumes  that  each  player  is  aware  of  the  previous  actions  of 
all  other  agents  (see,  e.g.,  [9],  [7]),  (b)  existence  and  characterization  of  Nash  networks,  i.e.,  showing 
that  Nash  networks  exist  and  characterizing  those  networks  may  not  be  possible  (see,  e.g.,  [23]),  (c) 
non-efficiency  of  Nash  networks  (see,  e.g.,  [9]),  (d)  distributed  convergence  to  desirable  Nash  networks, 
i.e.,  most  proposed  schemes  assume  best-reply  dynamics  for  convergence  to  Nash  networks  which  might 
be  infeasible  (see,  e.g.,  the  discounted  connections  model  in  [9]). 

This  paper  presents  a  novel  game-theoretic  approach  for  distributed  network  formation  that  addresses 
most  of  the  above  issues.  Our  approach  is  mostly  related  to  dynamic  and  evolutionary  models,  such  as 
[9],  [16],  [21],  [22].  In  particular,  we  consider  agents  that  have  freedom  over  establishing  or  severing 
unidirectional  links  with  neighboring  agents  based  on  myopic  considerations.  Unidirectional  links  model 
phenomena  such  as  web  links,  observations  of  others,  citations,  etc.  [24]. 

Specifically,  our  contributions  are  the  following:  (i)  We  discuss  the  case  where  nodes  can  form  links 
only  with  a  subset  of  the  other  nodes  (i.e.,  neighborhood  structures),  as  opposed  to  the  entire  network; 
(ii)  We  introduce  utility  functions  that  are  distance-dependent  variations  of  the  connections  model  of 

[8]  and  guarantee  that  Nash  networks  exist  and  exhibit  desirable  properties,  e.g.,  connectivity,  efficiency 
and  bounded-hop  diameter;  (iii)  We  introduce  state-dependent  utility  functions  that  can  model  dynamic 
phenomena  such  as  establishment  costs  and  can  be  used  as  an  equilibrium  selection  mechanism  in  favor  of 
efficient  Nash  networks;  (iv)  We  derive  a  learning  process  that  guarantees  convergence  to  Nash  equilibria 
for  the  state-based  extension  of  weakly  acyclic  games,  among  which  the  proposed  network  formation 
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games;  and  (v)  We  employ  payoff-based  dynamics  for  distributed  convergence  to  Nash  networks  based 
on  a  modified  reinforcement  learning  scheme  [25],  and  drop  the  typical  assumptions  that  nodes  have 
knowledge  of  the  full  network  structure  and  can  compute  optimal  link  decisions. 


III.  Terminology  &  Background 
A.  Games  with  state-based  utilities 

A  game  involves  a  finite  set  of  agents  (or  players),  X  =  (1,  2, ...,  n }.  Each  agent  i  has  a  finite  number 
of  actions,  denoted  by  A,  .  We  will  enumerate  the  actions  of  each  agent  i  and  let  a,  denote  the  index  of 
agent  V s  action.  The  total  number  of  actions  of  agent  i  is  denoted  by  \At\.  Define  A  to  be  the  Cartesian 
product  of  the  action  spaces  of  all  agents,  i.e.,  A  =  Ai  x  ...  x  An.  Let  also  a  =  (a1:  a2,  ■■■,  an)  E  A  be 
the  action  profile  of  all  agents. 

For  any  positive  integer  m,  A (m)  denotes  the  probability  simplex  in  Km,  i.e.,  A (m)  A  {x  E  : 
x  >  0,  lTx  =  1},  where  1  is  a  vector  of  ones  of  appropriate  size.  The  vectors  e3 ,  j  =  1,2, . . . ,  in,  denote 
the  vertices  of  A (m).  In  some  cases  and  by  abusing  notation,  we  will  identify  actions  by  vertices  of  the 
simplex  (instead  of  indices),  i.e.,  a*  =  e3  implies  that  agent  i  selected  action  j  e  At. 

In  strategic-form  games,  after  each  agent  i  E  X  has  selected  an  action  ay  E  Ai,  agents  are  assigned 
utilities,  i.e.,  evaluations  of  their  own  performance.  Usually,  such  utilities  are  represented  as  instances  of 
a  function  vl  :  A  K+,  called  utility  function,  that  takes  values  in  the  set  of  action  profiles.  In  other 
words,  the  utility  of  each  agent  i  depends,  in  general,  on  the  actions  of  all  players  (an,  a2, ...,  an)  E  A. 

In  order  to  also  incorporate  dynamic  phenomena  in  the  utilities  such  as  establishment  costs,  we  will 
assume  that  agents  measure  a  utility  or  payoff  that  depends  on  two  variables:  the  action  profile,  a,  and 
an  internal  agent-specific  state,  xt.  The  definition  of  this  state  variable  is  open  ended  (cf.,  “state-based” 
utility  functions  in  [26]  and  “sometimes  weakly  acyclic”  games  in  [27]).  In  the  present  setting,  we  restrict 
Xi  to  the  simplex  A  ( |  A,  | )  and  interpret  the  jth  entry  of  xr,  Xij  E  [0, 1],  as  the  “familiarity”  weighting  of 
agent  i  with  action  j  E  A,.  Since  xt  is  in  the  simplex,  these  relative  weights  sum  to  one. 

We  now  define  state-based  utility  functions  as  follows. 

Definition  3.1  (State-based  utility  function):  A  state -based  utility  function  maps  zy  :  Ax  A(|A,| )  — >  M+ 
with  Vi(a,Xi)  being  the  payoff  of  agent  i  at  joint  action  a  and  at  (familiarity)  state  ay. 

The  expression  zy  (a,  a*)  is  taken  to  mean  zy(a,  xf  evaluated  at  ay  =  ay.  In  that  case,  the  familiarity 
vector  Xi  corresponds  to  a  unit  vector,  say  Cj,  i.e.,  the  “familiarity”  weighting  of  agent  i  with  action  j  is 
1,  while  the  corresponding  weighting  with  other  actions  is  0. 

In  several  cases,  we  will  need  to  compare  joint  action  profiles  based  on  efficiency.  To  this  end,  we 
define  efficient  action  profiles  as  follows: 

Definition  3.2  ( Efficient  action  profile):  Define  the  value  function  V(a )  A  zy(a,  ay).  An  efficient 
action  profile  is  an  action  profile  a*  E  A  such  that  V(a*)  =  maxae_4  T(a). 

Let  —i  denote  the  complementary  set  X\{/'}.  We  will  often  split  the  argument  of  a  function  in  this  way, 
e.g.,  F(a)  =  F(cti,  a_y).  The  following  extends  the  notion  of  better  reply  to  state -based  utility  functions. 

Definition  3.3  (Better  reply):  The  better  reply  set  of  agent  i  E  I  to  an  action  profile  a  =  (ay,  a_y)  G  A 
is  a  function  BR,  :  A  A,  such  that  for  any  E  BR, (a)  we  have 

Vi((a*,  a-i),  ay)  >  zy((aj,  a_*),  a*).  (1) 


Note  that  better  reply  is  a  set-valued  function  and  might  be  empty.  Furthermore,  when  we  evaluate  the 
better  reply  set  of  an  agent  i  to  an  action  profile  a  =  (a,, the  underlying  familiarity  state  in  the 
agent’s  state-based  utility  is  assumed  to  be  the  corresponding  action  of  that  agent,  ay. 

Based  on  this  definition  of  better  reply,  we  introduce  the  notion  of  a  “stable”  action  profile  by  extending 
the  definition  of  a  Nash  equilibrium  to  state -based  utility  functions. 

Definition  3.4  (State-Based  Nash  equilibrium):  An  action  profile  a *  is  a  (state -based)  Nash  equilibrium 
if  BR?:(a*)  =  0  for  every  i  E  X,  i.e., 


Vi((a*,  a*,),  a*)  >  zy((a-,  a* y),  a*), 


(2) 
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for  all  a[  G  A,;\{«*}  and  i  G  I.  Likewise,  a  strict  (state -based)  Nash  equilibrium  satisfies  the  strict 
inequality  in  (2). 

For  the  remainder  of  the  paper,  we  will  refer  to  a  state-based  Nash  equilibrium  as  a  Nash  equilibrium. 


B.  Coordination  games 

In  this  section,  we  introduce  sufficient  conditions  for  the  existence  of  state-based  Nash  equilibria.  To 
this  end,  it  is  necessary  to  first  introduce  the  notion  of  an  improvement  step  defined  as  follows: 

Definition  3.5  ( Improvement  step):  The  improvement  step  function  of  agent  i  G  X  to  an  action  profile 
a  =  (a*,  a_j)  G  A  is  a  function  IS*  :  A  — >  A *  such  that  for  any  a*  G  IS* (a)  the  following  two  conditions 
are  satisfied: 

1)  Vi((a*,a-i),ai)  >  vf(ai,  «_*),  a*), 

2)  v;((a*,  «_*),«*)  > 

Note  that  if  a*  is  an  improvement  step  for  agent  i  to  the  action  profile  a,  i.e.,  a*  G  IS,  (a),  then  a* 
is  also  a  better  reply  to  a.  Accordingly,  if  a  is  a  Nash  equilibrium  profile,  i.e.,  BRj(o:)  =  0  for  all 
j  G  I,  then  we  also  have  ISj(a)  =  0.  The  converse  is  not  necessarily  true,  i.e.,  there  might  not  be  an 
improvement  step  from  an  “unstable”  action  profile. 

We  utilize  the  notion  of  an  improvement  step  to  derive  sufficient  conditions  for  the  existence  of  Nash 
equilibria.  To  this  end,  we  introduce  the  following  class  of  games. 

Definition  3.6  ( Coordination  game):  A  coordination  game  is  such  that  there  exists  a  function  0 :  A  — > 
R  with  the  following  property:  for  any  action  profile  a  =  {a\ , ....  an)  G  A  other  than  a  Nash  equilibrium, 
there  exist  an  agent  i  G  J  such  that  IS* (a)  f  0,  and  an  action  ce'  G  IS*  (a)  such  that 

4>(a'i,  a-i)  >  <})(ai,  a-i). 

We  will  refer  to  this  property  as  coordination  property  and  the  function  0  as  coordination  function. 

In  words,  a  coordination  game  is  such  that  at  any  action  profile  other  than  a  Nash  equilibrium,  there 
exist  an  agent  and  an  action  which  can  improve  both  its  own  payoff  and  the  value  of  the  coordination 
function  0.  Such  a  feature  introduces  a  weak  form  of  ordinal  potential  games  (cf.  [28]). 

Such  a  feature  is  also  related  to  a  form  of  “ coincidence  of  interests .”  Note  that  prior  definitions  of 
coordination  games,  e.g.,  by  [29],  [30],  assume  stronger  conditions.  For  example,  in  [30],  a  coordination 
game  is  defined  such  that  any  better  reply  (for  non-state -based  utilities)  makes  no  other  agent  worse  off. 

The  following  is  a  straightforward  consequence  of  the  definition  of  coordination  games. 

Claim  3.1:  For  coordination  games,  any  action  profile  which  maximizes  the  coordination  function  0  is 
a  Nash  equilibrium. 

This  claim  can  be  used  to  relate  efficient  action  profiles  with  Nash  equilibria  when  0  corresponds  to 
the  value  function. 

Claim  3.2:  For  any  coordination  game  such  that  the  coordination  function  0  corresponds  to  the  value 
function,  i.e.,  <p  —  V,  any  efficient  action  profile  is  a  Nash  equilibrium. 

Another  straightforward  consequence  of  the  definition  of  a  coordination  game  is  the  following. 

Claim  3.3:  For  coordination  games,  starting  from  any  action  profile  a  G  A,  there  exists  a  finite  sequence 
of  action  profiles  (a0,  a1, ...,  am},  such  that  a0  =  a,  of)  G  ISi(afc_1)  for  some  i,  and  am  is  a  Nash 
equilibrium. 

Setting  aside  the  state-based  aspect,  this  claim  indicates  that  coordination  games  defined  here  resemble 
weakly  acyclic  games  (cf.,  [31]). 

The  coordination  property  introduced  here  will  be  particularly  useful  in  (a)  showing  existence  of  Nash 
equilibria  (due  to  Claim  3.3)  in  network  formation  games,  and  (b)  designing  distributed  learning  schemes 
for  convergence  to  Nash  equilibria,  as  we  will  discuss  in  the  forthcoming  sections. 
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IV.  The  Network  Formation  Model 

A.  One-way  benefit  flow 

We  will  consider  a  one-way  (directed)  benefit  flow  model,  where  a  directed  network  G  consists  of  the 
non-empty  finite  set  of  agents  (or  nodes )  1  and  a  finite  set  of  pairwise  directed  links  (or  edges )  E. 

A  directed  link  from  node  i  to  node  j,  j  f  i,  is  denoted  which  represents  flow  of  bene¬ 

fits/information  from  i  to  j.  A  path  from  i  to  j  in  G,  (i  — >  j)  is  a  sequence  of  distinct  nodes  that 
starts  at  i  and  ends  at  j,  i.e.,  (i  — >  j )  —  {i  —  s0,si,  ...,sm  =  j}  for  some  positive  integer  m,  such  that 
(sk,Sk+ 1)  G  £  for  0  <  k  <  m  —  1.  The  number  of  links  in  the  path  (i  — >  j)  is  denoted  (i  — >  j)\.  For 
nodes  i  and  j  in  G,  the  distance  from  i  to  j,  denoted  distc(i,  j),  is  the  minimum  length  of  a  path  (i  — >  j), 
if  j  is  reachable  from  i,  i.e.,  distc(i,  j)  =  min^-^cG  |(*  j)  !•  If  there  is  no  path  from  i  to  j  in  G,  then 
by  convention  dist c(i,  j)  =  oo.  Also  distc(i,i)  =  0  for  every  node  iel 

Definition  4.1  (Connectivity):  A  network  G  is  connected  if  for  all  i  f  j,  (i  — >  j)  C  G. 

Two  useful  subclasses  of  connected  networks  are  critically  connected  networks  and  minimally  connected 
networks. 

Definition  4.2  (Critical  connectivity):  A  network  G  is  critically  connected  if  (i)  it  is  connected  and  (ii) 
if  (i,j)  G  G,  then  the  unique  path  (i  — >  j )  is 

In  words,  a  critically  connected  network  is  such  that  if  the  link  from  agent  i  to  (neighboring)  agent  j 
is  dropped,  then  there  is  no  path  (i  — >  j)  in  the  network. 

Definition  4.3  (Minimal  connectivity):  A  network  G  is  minimally  connected  if  (i)  it  is  connected  and 
(ii)  it  has  the  minimum  number  of  links. 

Note  that  a  minimally  connected  network  will  also  be  critically  connected.  The  converse  is  not  neces¬ 
sarily  true. 

B.  Action  spaces  and  neighborhood  structures 

We  assume  that  each  agent  i  may  establish  links  only  with  its  neighbors,  denoted  by  TV)  with  cardinality 
\J\fi\.  In  the  unconstrained  neighbors  case,  Mi  =  T\{t).  For  the  remainder  of  the  paper ,  we  will  assume 
that  the  neighborhoods  are  such  that  connectivity  is  cdways  feasible. 

The  set  of  actions  of  agent  i,  denoted  A,,  contains  all  possible  combinations  of  neighbors  with  which 
a  link  can  be  established  including  the  case  of  establishing  no  links,  i.e.,  A,  =  2^ .  The  notation  at  \ 
denotes  the  cardinality  of  a  £  2A/L 

By  abuse  of  notation,  we  will  use  a*  e  A,  to  refer  to  either  an  element  of  A,  =  2jV'1 ,  or  an  index  over 
A,;.  Likewise,  we  will  use  a  to  denote  the  network,  G,  induced  by  the  collective  actions  a  £  A,  and  so 
we  may  write  expressions  such  as  dista(i,  j)  rather  than  dist g(lj)- 

C.  Reward  and  cost  functions 

The  state-based  utility  function  of  agent  i  is  a  function  of  the  form  Vi  :  A  x  A(|^4j|)  — >  M+,  where 

Vi(a,Xi)  =  Rfia)  -  Cfia^Xi). 

The  function  R%  :  A  R_  is  the  reward  of  agent  i,  and  the  function  C,  :  A,  x  A ( |  A,  | )  — >  R_  is  the 
cost  of  the  action  of  agent  i.  We  will  consider  several  forms  of  the  reward  and  cost  function  defined  in 
the  following  sections  specifically  tailored  for  network  formation. 

1)  Connections  reward  model:  Assume  that  each  individual  is  a  source  of  benefits  that  others  can 
access  via  the  formation  of  links.  In  particular,  a  link  with  another  agent  inherits  the  benefits  available  to 
that  agent  via  its  own  links.  Following  [8],  define  the  connections  reward  function: 

Ri(a)  -  Aa(s^  i) 

s£l\{i} 


(3) 
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where 

.  .\  a  f  5dlst“(s’A  dista(s, i)  <  oo: 

Xa{s  -»•  i)  =  <  f  .v 

I  l),  dista(s,  i)  =  oo 

for  some  6  e  (0, 1].  We  will  refer  to  the  case  of  5  =  1  as  the  frictionless  benefit  flow  and  to  the  case  of 
5  <  1  as  the  decaying  benefit  flow. 

This  reward  function  has  also  been  used  in  a  game-theoretic  formulation  for  topology  control  in  wireless 
ad-hoc  networks  by  [32].  However,  in  [32],  the  action  space  is  the  neighborhood  range  and  every  two 
nodes  within  the  same  range  are  always  connected  with  a  bidirectional  link.  Instead,  here  links  are  assumed 
directional  to  reduce  communication  and  agents  have  the  ability  to  choose  over  which  links  to  establish 
assuming  a  given  neighborhood  layout.  In  [8],  [9],  [17]  the  same  reward  function  has  been  considered 
but  without  the  neighborhood  constraints  imposed  on  the  action  space. 

Reference  [9]  also  considers  the  possibility  of  decaying  benefits.  As  we  shall  see  later,  such  a  reward 
function  can  establish  an  upper  bound  in  the  distance  among  any  two  neighboring  nodes  at  any  Nash 
equilibrium.  However,  for  the  case  of  5  <  1,  existence  of  Nash  equilibria  is  not  guaranteed.  This  is  a 
main  reason  of  introducing  the  forthcoming  limited  connections  reward  model. 

2)  Limited  connections  reward  model:  We  consider  here  an  alternative  reward  function  that  keeps  track 
only  of  those  neighbors  which  are  at  most  AT-hops  away,  where  K  e  N.  Define  the  benefit  function  of 
agent  i  e  1  as 


seAfi 

where 

K(  _  )  =  J lj  dist«(M)  <  K\ 

|0,  distQ(s,  i)  >  K. 

In  other  words,  the  benefit  function  of  agent  i  counts  the  number  of  neighbors  within  distance  K.  One  can 
think  of  neighbors  in  this  setting  as  favorite  agents.  Although  an  agent  can  access  its  favorites  directly, 
doing  so  incurs  both  an  establishment  and  maintenance  cost  (forthcoming).  Therefore,  an  agent  may  seek 
to  gain  indirect  access  to  its  favorites. 

Unfortunately,  the  above  utility  function  does  not  necessarily  define  a  coordination  game  for  a  generic 
neighborhood  structure.  This  implies  that  Nash  equilibria  may  not  exist.  To  resolve  this  issue,  we  modify 
the  reward  to  include  “downstream”  effects  of  an  agent’s  actions. 

For  agent  i  el,  define  the  set  of  nodes  that  are  accessing  agent  i  as: 

Bfla;  K)  =  {j  e  T  :  (i  — >  j)  G  a  and  dista(i,  j)  <  K  —  1}. 


This  set  describes  the  downstream  beneficiaries  of  links  made  by  agent  i.  Define  the  downstream  deficiency 
of  agent  i  as: 


di(a)  =  Y] 

jeBi(a;K ) 


1 

0 


flj(a)  <  | Nj\ ; 

otherwise. 


An  agent  at  full  benefit  capacity  has  / 3j(a )  =  \N3  \ ;  otherwise,  the  agent  has  a  deficiency.  The  function 
di(a)  counts  the  number  of  downstream  beneficiaries  that  are  deficient. 

We  now  define  the  limited  connections  reward  model  as  follows.  For  0  <  7  <  1,  define 

ft(a)4A(a)(l-7T|W_).  (5) 

In  words,  this  function  rewards  connections  to  neighbors,  but  at  a  reduced  rate  because  of  possible 
downstream  deficiencies.  In  case  of  no  such  deficiencies,  i.e.,  dfla)  =  0,  the  reward  equals  the  benefit 


function. 

3)  A  state-based  cost:  The  cost  function  of  agent  i,  Ci  :  Ai  x  A(|A;|)  — >  R+,  is  defined  as: 

Xi)  =  k0  | oil |  +  /ci00a:0T (1  -  00x0),  (6) 

with  k0  >  0,  K\  G  M,  and  0*  :  A(|,40)  — >  defined  by  [00x00-  A  XnaeAdea} x™-  (RecaU  that  we  are 
using  a  as  both  an  index,  as  in  xia,  and  a  set,  as  in  j  G  a  G  2A/'1.)  In  words,  [00x00-  denotes  the  probability 
that  agent  i  will  form  a  link  to  neighbor  j  based  on  the  distribution  x,.  The  term  (1  —  00x0)T00a:0 
grows  with  misalignment  of  the  action  ot  with  the  distribution  xt.  In  the  perfectly  aligned  case,  for  any 
at  G  Ai  (viewed  as  a  vertex  of  A(|^4,  |)), 

00«0T(  1  -  00q:0)  =  0 

whereas  in  the  worst  case, 

max00o:0T(l  -  00x0)  =  |o0  . 

Xi 

The  first  part  of  the  cost  function  (6)  corresponds  to  the  cost  of  maintaining  the  currently  established 
links.  The  state  Xi  reflects  familiarity  with  a  particular  set  of  links.  Accordingly,  the  second  part  corresponds 
to  an  establishment  cost.  The  establishment  cost  models  possible  inertia  of  the  system.  When  >  0, 
this  term  represents  the  effort  necessary  to  establish  a  new  link,  whereas  in  the  case  K\  <  0,  it  represents 
incentives  to  explore. 


V.  Nash  Equilibrium  Networks 
A.  Existence,  Connectivity,  and  Efficiency 

We  begin  by  establishing  existence  of  Nash  equilibria  by  deriving  sufficient  conditions  for  the  network 
formation  games  defined  here  to  be  coordination  games.  For  the  remainder  of  this  paper,  we  will  use  the 
following  shorthand  notation  to  represent  the  specific  network  formation  game  under  discussion: 

—  Ci:  The  connections  reward  function  (3)  and  state -based  cost  function  (6). 

—  £:  The  limited  connections  reward  function  (5)  and  state -based  cost  function  (6). 

Proposition  5.1  (Coordination  property):  Let  >  0. 

1)  Ci  is  a  coordination  game  for  5  =  1  and  k0  +  Kq  <  1. 

2)  £  is  a  coordination  game  for 


Kq  +  Kl  <  1  —  7  & 


K0  < 


m 

m  - 1 
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for  all  i  G  1. 


Proof:  See  Appendix.  ■ 

In  other  words,  Proposition  5.1  derives  conditions  for  the  cost  parameters  kq  and  n\  under  which 
the  resulting  network  formation  game  is  a  coordination  game.  Due  to  this  property  and  Claim  3.3  the 
resulting  network  formation  games  admit  Nash  equilibria.  More  specifically,  the  following  properties  are 
direct  consequence  of  Proposition  5.1  and  its  proof. 

Proposition  5.2  (Nash  equilibrium  connectivity):  Under  the  hypotheses  of  Proposition  5.1, 

1)  Both  Ci  and  £  admit  Nash  equilibria. 

2)  If  a*  is  a  Nash  equilibrium  in  £,  then  a*  is  connected. 

3)  If  a*  is  a  Nash  equilibrium  in  £,  then  distQ*  (j,  i)  <  K,  for  all  i  G  X  and  j  G  A/0 

We  comment  that  condition  2  remains  true  in  the  Ci  framework  for  k0  +  M  <  5  <  1.  However,  in  the 
case  of  decaying  benefit  flow  (i.e.,  5  <  1),  the  existence  of  a  Nash  equilibrium  is  not  guaranteed  under 
neighborhood  structures. 

We  will  refer  to  Nash  equilibria  of  these  network  games  as  Nash  networks.  Note  that  because  of  the 
state -based  utility  functions,  these  Nash  networks  need  not  coincide  with  Nash  networks  from  prior  studies. 

Finally,  the  following  proposition  relates  efficiency  with  Nash  equilibria. 
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Proposition  5.3  (Nash  network  efficiency):  Under  the  hypotheses  of  Proposition  5.1,  for  both  £  and  12, 
efficient  networks  are  Nash  networks  with  a  minimum  number  of  links. 

Proof:  As  we  showed  in  the  proof  of  Proposition  5.1,  both  £  and  12  satisfy  the  coordination  property, 
where  the  coordination  function  f  is  defined  as  the  network  value  (15).  Due  to  the  coordination  property 
and  Claim  3.2,  efficient  action  profiles  are  Nash  networks.  In  the  case  of  £,  the  value  at  a  Nash  network, 
a *,  is  V(a*)  =  n(n  —  1)  —  n0  Yhiex  \cx*\ ,  due  to  the  connectivity  property  (Proposition  5.2).  Likewise,  in 
the  case  of  12,  the  value  of  the  network  at  a  Nash  network,  oi  ,  is  U(cr  )  —  \Ni\  Siej  I ai  I  • 
either  case,  the  value  is  maximized  at  a  Nash  network  with  minimum  number  of  links.  ■ 

Propositions  5. 2-5. 3  address  one  of  the  main  issues  related  to  designing  network  formation  games,  that 
is,  (a)  showing  existence  of  Nash  equilibria ,  and  (b)  showing  efficiency  of  Nash  equilibria.  In  particular, 
the  introduced  notion  of  coordination  game  provides  a  test  criterion  for  existence  of  Nash  equilibria  in 
network  formation  games.  Furthermore,  due  to  the  coordination  property  of  the  designed  utility  functions 
in  £  and  12,  the  efficient  networks  (under  the  hypotheses  of  Proposition  5.1)  are  also  Nash  equilibria. 

B.  Special  case:  K\  =  0 

Nash  networks  have  a  special  structure  in  case  =  0. 

Proposition  5.4  ( £  Nash  networks  for  S  =  1,  k  ,  =  0):  Under  the  hypotheses  of  Proposition  5.1  and  for 
Ki  =  0,  a  network  in  £  is  a  Nash  network  if  and  only  if  it  is  critically  connected. 

Proof:  See  Appendix.  ■ 

For  the  connections  model  £  with  <5  =  1  and  K\  =  0,  the  Nash  networks  for  n  —  3  agents  are  shown 
in  Fig.  1.  Both  networks  are  critically  connected. 


i  i 


Fig.  1.  Two  Nash  networks  for  n  =  3  agents  in  £  with  5=1  and  k\  =  0. 

In  other  words.  Proposition  5.4  revealed  that  Nash  networks  in  £,  when  <5  =  1  and  n\  =  0,  are  networks 
which  are  not  only  connected,  but  also  critically  connected,  i.e.,  there  exists  at  most  one  direct  link  between 
any  two  nodes.  Such  property  indirectly  implies  that  the  number  of  links  for  each  node  at  a  Nash  network 
is  limited. 

Proposition  5.4  (which  was  first  derived  in  an  earlier  version  of  this  paper  [1]),  extends  [9,  Proposi¬ 
tion  3.1],  according  to  which  Nash  networks  are  critically  connected  under  unconstrained  neighbors. 

An  appropriate  generalization  of  a  critically  connected  network  is  also  a  Nash  network  in  the  £  frame¬ 
work.  Define  a  K -critically  connected  network  to  be  a  critically  connected  network  with  the  additional 
property  that  distG(j,  i)  <  K  for  all  i,  j  el  and  j  e  Aft. 

Proposition  5.5  (£  Nash  networks  for  K\  =  0):  Under  the  hypotheses  of  Proposition  5.1  and  for  K\  = 
0,  a  network  in  £  is  a  Nash  network  if  it  is  A'-critically  connected. 

Proof:  Following  the  proof  of  Proposition  5.4,  let  a*  be  a  A'-critically  connected  network,  and  let  of 
be  a  better  reply.  From  the  proof  of  Proposition  5.1,  we  can  assume  that  oi[  maintains  a  radius  of  K  for 
all  of  A).  Furthermore,  the  assumption  on  k0  implies  that  of  does  not  induce  any  downstream  deficiencies 
in  II, (a*:  K).  Therefore,  as  in  the  proof  of  Proposition  5.4,  |Adrop|  —  AA,]  >  0,  and  so  one  can  apply 
the  same  arguments.  ■ 

Note  that  the  reverse  implication  may  not  hold. 

In  the  £  framework  with  decaying  benefits  (3  <  1),  the  Nash  equilibrium  condition  imposes  a  structural 
constraint  on  the  distances  between  nodes. 

Proposition  5.6  (£  Nash  networks  for  5  <  1,  —  0):  For  £  with  5  <  1,  0  <  k0  <  5,  and  =  0,  let 

a*  be  a  Nash  network  corresponding  to  the  joint  action  a*  G  A.  For  any  agent  i,  if  a*  <  |Ay|.  then 

<)dista*(U)  >5- n0  for  all  j  e  K- 


(7) 
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Proof:  Let  a*  satisfy  the  assumptions  of  Proposition  5.6,  and  compare  an  alternative  action  a[  G  A, 
that  consists  of  adding  a  direct  link  to  neighbor  j,  i.e.,  a '  =  a*  U  {j}.  The  resulting  utility  to  agent  i  can 
be  bounded  by 


u((«',  «V),  a*)  >  «<((«?,  a!*),  a?)  +  (5  -  S***™)  -  k0. 


That  is,  the  consequence  of  adding  a  link  to  j  shortens  the  distances  to  other  links;  adds  the  direct  benefit 
of  a  link  to  j;  loses  the  indirect  benefit  of  a  link  to  j;  and  incurs  additional  maintenance  cost.  Therefore, 
if  (5  -  <$dkta.(j\i))  —  kq  >  0,  then  there  is  an  incentive  to  add  a  link  to  j,  and  so  a*  cannot  be  a  Nash 
network.  Conversely,  asserting  that  a*  is  a  Nash  network  implies  the  desired  result.  ■ 

The  condition  \a*\  <  \Af,t\  means  that  agent  i  is  not  using  all  of  its  available  links.  The  inequality  (7)  is 
revealing  only  for  neighbors  of  i  for  which  there  is  no  direct  link.  This  could  be  of  interest,  for  example, 
in  the  unconstrained  neighbors  case  with  a  large  number  of  agents. 

This  theorem  can  be  used  to  bound  distances  to  neighbors  as  follows.  Inequality  (7)  is  equivalent  to 


dist < 


log(£  -  Kg) 
log(5) 


=  d. 


A  sufficient  condition  to  bound  the  distance  to  neighbors  by  d  is  then  k(,  <  8  —  8d. 

For  example,  consider  8  and  kq  such  that  8  —  82  <  kq  <  8  —  83.  According  to  Proposition  5.6,  this 
condition  implies  that  in  any  Nash  network  the  maximum  distance  that  can  be  supported  is  d  =  2.  Under 
these  conditions,  Fig.  2  shows  two  Nash  networks.  It  is  straightforward  to  show  that  both  networks  in 
Fig.  2  are  also  Nash  networks  for  the  £  framework  when  K  =  2  and  for  unconstrained  neighbors. 


i  i 


Fig.  2.  Two  Nash  networks  for  n  =  4  agents  under  £  with  5  —  S2  <  Ko  <  S  —  83  and  m  =  0. 


Note,  finally,  that  Proposition  5.6  does  not  address  whether  or  not  Nash  equilibria  exist  in  £  for  8  <  1. 


C.  Strict  Nash  networks  and  small  K  \ 

A  forthcoming  section  deals  with  a  distributed  learning  process  based  on  reinforcement  learning.  It 
turns  out  that  under  certain  conditions,  this  process  can  converge  to  strict  Nash  equilibria,  but  not  to 
action  profiles  which  are  not  Nash  equilibria.  The  following  propositions  relate  strict  Nash  equlibria  for 
small  to  Nash  equilibria  for  K\  =  0. 

We  start  with  considering  positive  establishment  cost. 

Proposition  5.7  (Nash  networks  for  small  n  {  >  0):  Under  the  hypotheses  of  Proposition  5.1,  for  both 
£  and  £,  there  exists  R\  >  0  such  that: 

1)  If  a  is  not  a  Nash  network  for  k,i  =  0,  then  a  is  not  a  Nash  network  for  Ki  G  (0,  Ri); 

2)  If  a  is  a  Nash  network  for  /sq  =  0,  then  cc  is  a  strict  Nash  network  for  K\  G  (0,  fiq). 

Proof: 

Part  1:  Suppose  a  is  not  a  Nash  network  for  Kt  =  0.  Then  there  exists  a  better  reply,  a[  f  ar  such  that 

Ri(a')  —  k,0  \a[\  >  Rf  a)  —  k0  \a^\ , 

where  a'  =  (a-,  «_*).  This  a[  remains  a  better  reply  for  non-zero  Ki  as  long  as 

Ri(a')  —  k0  |a4|  —  Ki,0(q:')t(1  —  ^(cfj))  >  Rfa)  —  k0  |ot|  . 
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Define 


A 


Kl 


mu 

iSX 

ac,aif  £A. 


-^(qQ  +  ft0(|Qt|  -  |q-|)  -  -Rj(g) 

V’KFt1  -  V’K)) 


subject  to  a  not  a  Nash  network  and  a[  €  BRj(a).  This  minimization  involves  strictly  positive  values 
over  a  finite  set.  Therefore,  the  minimum  is  also  strictly  positive. 

Part  2:  Suppose  a  is  a  Nash  network  for  K\  =  0.  Then  for  all  i  e  1  and  at  e  At, 


Ri(a )  -  n0\ai\  >  Riia't,  «_*)  -  k0  \a[ 


Therefore,  for  positive  «q, 

Ri(a )  -  kq  \ati\  >  Ri(a[,  a_*)  -  k0  \a[\  -  ki^(q;-)t(1  -  ^(a*)). 


The  question  is  whether  the  above  inequality  is  strict.  Recall  the  distinct  sets  N^eep,  A^r0p,  and  iVadd 
defined  in  (16).  Clearly  if  7Vadd  f  0,  the  above  inequality  is  strict.  Now  suppose  that  iVadd  =  0  while 
N drop  0  and 

Ri{(R)  Ko  ( |  ^keep  T  ^Cdrop  |  )  Rj  ( Oi  )  Kq  |iV^eep|  . 

This  equality  means  that  a'  is  also  a  Nash  network  for  «q  =  0,  but  with  a[  fewer  links  than  az.  This 
conclusion  violates  the  derived  connectivity  properties  of  Nash  networks.  ■ 

In  other  words.  Proposition  5.7  states  that  when  we  increase  «q  from  zero  to  a  positive  value,  the  set  of 
Nash  networks  remains  identical  with  the  case  of  k1  =  0,  however,  all  Nash  networks  become  strict.  This 
observation  has  several  implications  when  we  discuss  distributed  learning  processes  in  network  formation 
games,  since  strict  Nash  networks  are  potential  attractors  of  the  learning  process,  while  non-strict  Nash 
networks  may  not  be.  Thus,  by  increasing  «q,  we  are  able  to  shape  the  set  of  strict  Nash  networks  to  all 
critically  connected  networks  (due  to  Proposition  5.4). 

The  case  of  negative  establishment  cost,  i.e.,  k  \  <  0,  can  be  viewed  as  rewarding  exploration.  The 
consequences  are  as  follows. 

Proposition  5.8  (Nash  networks  for  small  «q  <  0):  Assume  the  hypotheses  of  Proposition  5.1  with /sq  = 
0.  For  both  £  and  £,  there  exists  as,  <  0  such  that: 

1)  If  a  is  not  a  Nash  network  for  «q  =  0,  then  a  is  not  a  Nash  network  for  «q  e  (/sq,  0); 

2)  If  a  is  a  non-strict  Nash  network  for  «q  =  0,  then  a  is  not  a  Nash  network  for  «q  e  (/cx,  0); 

3)  If  a  is  a  strict  Nash  network  for  K\  =  0,  then  a  is  a  strict  Nash  network  K\  e  (/cx,  0). 

Proof: 

Part  1:  Since  «q  is  negative,  this  automatically  preserves  that  a  is  not  a  Nash  network. 

Part  2:  Suppose  that  a  is  a  non-strict  Nash  network  for  K\  =  0,  and  let  a[  satisfy  Rfa)  —  k0  \at\  = 
Ri(a')  —  Kq  \a'i\.  Then  a'  is  also  a  non-strict  Nash  network  for  /sq  =  0.  As  argued  in  the  proof  of 
Proposition  5.7,  \at\  =  \oi[\.  Therefore,  there  are  links  in  a[  not  in  a*.  For  K\  <  0,  a-  G  BR,;(q). 

Part  3:  If  a  is  a  strict  Nash  network  for  K\  =  0,  then  for  all  /'  G  I.  we  have  Rfa)  —  n0  \a,i\  > 
i?j(a',  cc_j)  —  |a'j  .  This  remains  a  strict  Nash  network  as  long  as 

Rfa)  —  k0  \a.i\  >  Rfoi'i,  a_j)  —  n0  |a'|  —  Ki'?/)(q:')t(1  — 

As  in  Proposition  5.7,  the  above  inequality  can  be  used  to  extract  a  lower  bound  on  «q  that  preserves 
strictness.  ■ 

In  other  words,  Proposition  5.8  states  that  by  decreasing  the  value  of  «q  from  zero  to  a  negative  value, 
we  can  make  the  strict  Nash  networks  of  the  case  «q  =  0  to  be  the  only  Nash  networks.  This  has  the 
opposite  effect  compared  with  Proposition  5.7.  In  fact,  and  as  we  will  also  explain  in  the  following 
section,  we  can  exclude  convergence  to  any  Nash  network  other  than  the  strict  Nash  networks  of  the  case 
/sq  =  0.  This  can  be  desirable  in  certain  cases.  For  example,  in  the  unconstrained  neighbors  case  (i.e., 
when  ffi  =  X  for  all  i  e  X),  and  when  n  \  =  0,  the  only  strict  Nash  equilibria  are  the  wheel  networks , 
where  each  node  has  exactly  one  link  (see,  e.g.,  Fig.  1  for  the  case  of  three  nodes).  In  this  case,  the  strict 
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Nash  networks  are  minimally  connected,  and  they  are  the  only  Nash  networks  for  small  negative  values 

Of  Ki. 

Both  Propositions  5.7-5. 8  reveal  the  potential  of  state -based  utility  functions  in  shaping  the  set  of  Nash 
equilibria  towards  ones  which  exhibit  more  desirable  properties. 

VI.  Learning  dynamics 

Thus  far,  the  “state”  in  the  state -based  utility  has  only  served  to  shape  the  set  of  strict  Nash  networks. 
In  this  section,  the  value  of  this  state  will  be  inherited  from  the  state  of  a  learning  process.  In  this  dynamic 
setting,  the  interpretation  of  the  state  reflecting  “familiarity”  will  be  apparent.  We  present  two  forms  of 
learning  dynamics.  The  first  is  “action-based”,  i.e.,  each  player  can  observe  the  actions  of  other  players. 
These  dynamics  will  resemble  a  state -based  variation  of  adaptive  play  defined  in  [33].  We  will  show  that 
these  dynamics  globally  converge  to  a  Nash  network.  The  second  form  of  learning  dynamics  is  based 
on  reinforcement  learning.  A  desirable  characteristic  is  that  these  dynamics  are  “payoff-based”.  Agents 
cannot  observe  the  overall  network.  Rather,  agents  only  measure  their  utility  received  from  the  network. 
We  will  show  that  these  dynamics  locally  converge  to  a  strict  Nash  network. 


A.  Adaptive  play 

The  “state”  in  the  state-based  utility  will  evolve  over  stages,  t  —  0, 1,2, ....  Let  M  >  1  be  an  integer, 
denoting  “memory  length”.  For  each  i  e  X,  define 

1  M— 1 

^(t+i)  =  —  (8) 

T— 0 

where  we  associate  each  action  aft)  as  a  vertex  of  A(Ai).  In  words,  xft  +  1)  is  the  empirical  frequency 
of  the  actions  of  agent  i  over  the  previous  M  stages. 

We  will  need  to  extend  the  definition  of  better  reply.  Define  the  set-valued  function: 

BR^ajXi)  = 

{a*  G  Ai  :  Vi((a*,a-i),Xi)  >  cm),  a*)}  . 


(In  the  previous  definition,  Xi  was  set  to  a*.) 

Let  p  G  (0, 1).  Actions  evolve  according  to  the  following  (non-deterministic)  rule: 


where 


O-iit) 


aft  -  1),  if  BR i(a(t  -  1  )',xft))  =  0; 
a: -(f),  otherwise, 


a[(t)  G 


aft  -  1), 

BR fa(t  -  1  );xi(t)), 


with  probability  p; 
with  probability  1  —  p. 


(9a) 


(9b) 


Proposition  6.1:  Assume  the  hypotheses  of  Proposition  5.1,  state  dynamics  (8),  and  action  selection 
rule  (9).  In  both  £  and  £  framework,  x(t)  converges  to  a  Nash  network  with  probability  one  for  any 
integer  M  >  1  and  initialization  a(r),  t  —  0, 1, ...,  M  —  1. 

Proof:  (sketch)  Consider  the  following  chain  of  events.  With  positive  probability,  all  agents  repeat 
their  actions  for  M  stages  prior  to  stage  T.  Then  x(T)  =  a(t  —  1).  At  this  stage,  if  x(T)  is  a  Nash 
network,  then  the  dynamics  have  converged.  Otherwise,  there  exists  a  single  agent  with  a  better  reply. 
Let  this  be  the  only  agent  that  updates  its  strategy,  while  all  others  repeat.  Now  let  all  agents  again  repeat 
their  actions  for  M  stages.  According  to  Claim  3.3,  this  process  can  repeat  until  the  state  converges  to  a 
Nash  equilibrium.  The  probability  of  such  a  chain  of  events,  say  £*  is  strictly  positive  (however  small). 
Therefore,  by  the  Borel-Cantelli  Lemma  (cf.,  Lemma  3.14  in  [34]),  the  process  eventually  converges  to  a 
Nash  network.  ■ 
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B.  State-based  reinforcement  learning 

Our  reinforcement  learning  scheme  assumes  that  at  each  stage  t  —  0, 1,2, each  agent  i  selects  an 
action  aft)  E  Ai  according  to  the  probability  distribution 

(1  -  A )xft)  +  1,  (10) 

where  (i)  xft)  E  A(|.A,;|)  is  the  strategy  of  agent  i  at  stage  t;  (ii)  1  is  a  vector  of  appropriate  dimension 
with  each  element  equal  to  1;  and  (iii)  A  >  0  is  a  parameter  used  to  model  possible  perturbations  in  the 
decision  making  process,  also  called  mutations  [33],  [35]. 

The  strategy  of  agent  i  is  updated  according  to  the  recursion: 

xft  +  1)  =  xft)  +  e(t)  ■  vfa(t),xft))  ■  (aft)  -  xft)).  (11) 

In  this  recursion,  the  j-th  entry  of  the  reinforcement  state,  xt],  can  naturally  capture  “familiarity”  weighting 
of  agent  i  with  action  j  E  At,  since  xl3  increases  if  action  j  is  selected  and  decreases  otherwise. 
Accordingly,  we  have  selected  xt  to  be  the  familiarity  state  in  the  reward  function  vt.  Such  selection 
also  simplifies  significantly  the  stability  analysis  of  the  recursion. 

Note  that  in  standard  reinforcement  learning,  e.g.,  the  models  of  [36],  [37],  [38],  the  reward  vt  is  a 
function  of  the  current  action  profile  a(t)  and  not  a  function  of  the  reinforcement  state  xft). 

We  will  generally  consider  the  step-size  sequence 

e(t)  =  1/r  +  l), 

where  v  E  (1/2, 1].  The  parameter  v  affects  the  rate  of  convergence.  It  is  straightforward  to  show  that 
for  sufficiently  large  t  the  vector  xt  ( • )  evolves  within  the  probability  simplex  which  is  sufficient  for  the 
stability  analysis  considered  here. 

The  convergence  properties  of  (11)  can  be  characterized  via  the  ODE  method  for  stochastic  approxima¬ 
tions  (cf.,  [39]).  Before  proceeding,  first  define  0  to  be  the  canonical  path  space  with  an  element  w  6  O 
being  a  sequence  (x(0),  a;(l), ...},  where  x(t)  =  (aq  (t ),...,  xn(t))  E  A  is  generated  by  the  process  and 
A  A  A(|*4.1|)  x  •  •  •  x  A(|^4n|)  .  Define  also  the  random  variable  yT  :  f)  — »  A  such  that  yT(cu)  =  x(r). 
In  several  cases,  we  will  abuse  notation  by  writing  x(r)  instead  of  yT(cu).  Let  also  F  be  a  a-algebra  of 
subsets  in  Q  and  P  a  probability  measure  on  (0,JF)  induced  by  the  recursion  (11).  The  a- algebra  F  will 
be  generated  appropriately  to  allow  computation  of  the  probabilities  of  interest.  Finally,  let  E  denote  the 
expectation  with  respect  to  measure  P.  Define 

9i(x(t))  —  E [vfa(t),xft))  ■  (aft)  -  xft)) \x(t)}, 

and  the  ODE 


x  =  g(x).  (12) 

where  g(-)  =  [gf-)\iex-  The  asymptotic  behavior  of  the  recursion  (11)  can  be  described  through  the 
invariant  sets  of  (12).  It  has  been  shown  by  Proposition  3.4  in  [25]  that  for  A  =  0,  any  pure  strategy 
profile  a*  =  (a\, ...,  a*  )  is  a  stationary  point  of  the  ODE  (12),  i.e.,  g(a*)  =  0.  The  sensitivity  of  stationary 
points  when  A  >  0  is  as  follows. 

Proposition  6.2  (Sensitivity  of  stationary  points):  For  any  pure  strategy  profile  a*  which  is  a  strict  Nash 
equilibrium,  and  for  sufficiently  small  A  >  0,  there  exists  a  unique  continuously  differentiable  function 

v*  :  M+  — »  Rl"4!,  such  that  lim^o  Z/*(A)  =  z/*(0)  =  0,  and 

x*  =  a*  +  v*(\)  E  Int(A)  (13) 

is  a  stationary  point  of  the  ODE  (12).  If  instead  a*  is  not  a  Nash  equilibrium,  then  there  exist  e  >  0  and 
An  >  0,  such  that  the  ^-neighborhood  of  a*  in  A,  Of  a*),  does  not  contain  any  stationary  point  of  the 
ODE  (12)  for  any  0  <  A  <  A0. 
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Proof:  The  proof  follows  similar  reasoning  with  Proposition  3.5  in  [25].  ■ 

Note  that  Proposition  6.2  does  not  discuss  the  sensitivity  of  stationary  points  which  are  non-strict  Nash 
equilibria.  However,  as  the  analysis  in  [25]  showed,  a  vertex  cannot  be  a  stationary  point  of  the  perturbed 
dynamics. 

Then,  the  behavior  of  the  recursion  (11)  nearby  stationary  points  is  described  by: 

Proposition  6.3  (Convergence  &  Nonconvergence ):  For  sufficiently  small  A  >  0,  let  x*  be  a  stationary 
point  of  the  ODE  (12)  corresponding  to  a  strict  Nash  equilibrium  a*  <E  A  according  to  (13).  When  the 
reinforcement  learning  scheme  (11)  is  applied,  Pflim^oo  x(t)  =  x*]  >  0.  If  instead  a*  is  not  a  Nash 
equilibrium,  then  there  exist  £  >  0  and  A0  >  0  such  that  Pjlim^oo  x(t)  G  Oe{a*)\  =  0  for  all  0  <  A  <  A0. 

Proof:  The  proof  of  the  first  statement  is  based  on  the  fact  that  any  stationary  point  x*  which 
corresponds  to  a  strict  Nash  equilibrium  (according  to  (13))  is  a  locally  asymptotically  stable  point  of 
the  ODE  (12).  This  can  be  shown  by  following  similar  reasoning  with  Proposition  3.6  in  [25].  Then,  by 
applying  Theorem  6.6.1  of  [39]  we  conclude  that  P[limt_>00x(i)  =  x*]  >  0  (see  also  Proposition  3.1  in 
[25]).  The  proof  of  the  second  statement  follows  from  Proposition  6.2  and  the  fact  that  the  vector  field  in 
the  vicinity  of  a*  points  towards  the  interior  of  A  for  any  small  A  >  0  (see  also  Proposition  3.7  in  [25]). 

■ 

Proposition  6.3  establishes  convergence  with  positive  probability  of  the  state-based  reinforcement  learn¬ 
ing  to  the  set  of  strict  Nash  equilibria  and  non-convergence  to  action  profiles  that  are  not  Nash  equilibria. 
Convergence  or  non-convergence  arguments  cannot  be  established  for  perturbations  of  non-strict  Nash 
equilibria.  However,  as  we  showed  in  Section  V-C,  the  “familiarity”  weights  can  be  utilized  to  shape 
appropriately  the  set  of  strict  Nash  equilibria  and  also  eliminate  the  set  of  non-strict  Nash  equilibria. 

Summarizing,  in  this  section  we  showed  that  (a)  reinforcement  learning  can  be  modified  to  incorporate 
“familiarity”  weights  in  the  utility  functions,  and  (b)  we  can  establish  convergence  with  positive  probability 
to  the  set  of  strict  Nash  equilibria. 


C.  Simulations 

In  this  section,  we  illustrate  the  utility  of  adaptive  play  and  state-based  reinforcement  learning  on 
network  formation  games.  To  this  end,  we  consider  the  following  two  examples:  (a)  n  =  16  nodes  are 
placed  on  the  vertices  of  a  rectangular  grid  as  shown  in  Fig.  3,  such  that  the  neighborhood  of  each  node 
consists  of  the  two  closest  nodes  along  the  horizontal  and  vertical  axis,  e.g.,  A 4  =  {2,  5,  7, 10};  (b)  n  —  6 
nodes  are  placed  on  a  circle  as  shown  in  Fig.  5,  such  that  the  neighborhood  of  each  node  consists  of  the 
two  closest  nodes  on  the  circle,  e.g.,  N\  =  (2,  6}. 

First,  let  us  consider  the  setup  of  example  (a)  where  nodes  are  placed  on  the  vertices  of  a  rectangular 
grid.  A  typical  response  of  adaptive  play  with  M  —  2  and  p  —  0.1  applied  in  the  connections  model  £  with 
kq  =  1/8  and  n\  =  0  is  shown  in  Fig.  3,  where  we  have  plotted  the  final  graph  and  the  running  average 
of  the  mean  distance  from  neighbors.  Note  that  a  critically  connected  network  is  formed  as  expected 
by  Proposition  5.4.  Furthermore,  the  distances  among  neighboring  nodes  vary  due  to  the  fact  that  the 
connections  model  £  does  not  impose  any  constraint  in  the  internode  distances. 

If,  instead,  the  limited  connections  model  £  is  applied  with  K  —  3,  «0  =  1/8,  K\  —  0  and  7  =  1/2, 
then  a  typical  response  is  shown  in  Fig.  4.  According  to  Proposition  5.2,  we  should  expect  that  adaptive 
play  converges  to  a  connected  network  such  that  the  internode  distance  between  any  two  neighboring 
nodes  is  no  larger  than  K .  Indeed,  as  we  observe  in  Fig.  4,  the  running  average  of  the  mean  distance 
from  neighbors  does  not  exceed  K  for  all  agents. 

To  demonstrate  the  utility  of  state -based  utility  functions  in  shaping  the  set  of  Nash  networks,  we 
consider  example  (b),  where  nodes  are  placed  on  a  circle.  Under  the  connections  model  €  and  the  assumed 
neighborhood  layout,  there  are  only  two  families  of  critically  connected  networks,  namely  the  star-like 
network  of  Fig.  5,  and  the  wheel  network  of  Fig.  6.  However,  the  wheel  networks  are  the  only  strict  Nash 
and  efficient  networks.  The  adaptive  play  and  reinforcement  learning  algorithms  introduced  here  are  likely 
to  converge  to  any  Nash  equilibrium  (star-like  or  wheel  network),  even  though  the  star-like  network  is  a 
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Fig.  3.  A  typical  response  of  adaptive  play,  with  M  =  2  and  p  =  0.1,  under  £  with  Ko  =  1/8  and  tci  =  0:  (a)  Final  graph,  (b)  Running 
average  of  mean  distance  from  neighbors  with  time. 


Fig.  4.  A  typical  response  of  adaptive  play,  with  M  =  2  and  p  =  0.1 
graph,  (b)  Running  average  of  mean  distance  from  neighbors  with  time. 


under  £  with  K  =  3,  kq  =  1/8,  Ki  =  0  and  7  =  1/2:  (a)  Final 


non-strict  Nash  network.  Fig.  5  shows  a  typical  response  of  adaptive  play  which  converges  to  the  star-like 
Nash  network  under  the  connections  model  £  with  k0  =  1/4  and  k  ,  =  0. 


(a)  (b) 

Fig.  5.  A  typical  response  of  adaptive  play,  with  M  =  2  and  p  =  0.1,  under  aco  =  1/4  and  m  =  0:  (a)  Final  graph,  (b)  Running  average 
of  mean  distance  from  neighbors  with  time. 


According  to  Proposition  5.8,  it  is  straightforward  to  show  that  in  the  connections  model  tl  with 
e  ( — Ho ,  0)  the  wheel  networks  will  be  the  only  strict  Nash  networks.  Furthermore,  any  other  (critically 
connected)  network  will  not  be  a  Nash  network.  Fig.  6  shows  a  typical  response  of  adaptive  play  in  the 
connections  model  €  when  k0  =  1/4  and  n \  =  —1/10,  where  convergence  to  a  wheel  network  is  observed. 
Under  the  same  framework,  Fig.  7  shows  a  typical  response  of  state-based  reinforcement  learning  (11) 
where  also  convergence  to  a  wheel  network  is  observed.  Thus,  we  showed  how  state -based  utility  functions 
can  be  utilized  to  exclude  convergence  from  non-efficient  Nash  networks. 
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Fig.  6.  A  typical  response  of  adaptive  play,  with  M  =  2  and  p  =  0.1,  under  £  with  Ko  =  1/4  and  tti  =  —1/10:  (a)  Final  graph,  (b) 
Running  average  of  mean  distance  from  neighbors  with  time. 


Fig.  7.  A  typical  response  of  state-based  reinforcement  learning  (11)  with  A  =  0.01  and  v  =  2/3,  under  £  with  Ko  =  1/4,  k\  =  —1/10: 
(a)  Final  graph,  (b)  Running  average  of  mean  distance  from  neighbors  with  time. 


VII.  Concluding  Remarks  and  Future  Work 

We  presented  a  strategic-form  game  formulation  for  the  problem  of  distributed  network  formation. 
Some  key  distinguishing  features  of  this  work  include:  (i)  directed  links  and  neighborhood  constraints,  (ii) 
distance-dependent  utility  functions  which  guarantee  existence  of  Nash  networks;  (iii)  state-based  utility 
functions  that  can  model  dynamic  phenomena,  such  as  establishment  costs,  and  can  shape  the  set  of  Nash 
networks;  and  (iv)  conditions  which  guarantee  existence  of  Nash  equilibria  for  the  state-based  extension 
of  weakly  acyclic  games.  Although  state -based  utility  functions  were  not  necessarily  associated  with  a 
specific  form  of  learning  dynamics,  we  showed  that,  when  combined  with  adaptive  play  or  reinforcement 
learning,  they  provide  an  equilibrium  selection  approach  in  network  formation  games.  For  example,  we 
showed  how  efficient  graphs  can  be  the  only  attractors  of  adaptive  play  and  reinforcement  learning  when 
a  negative  establishment  cost  is  considered.  The  proposed  reinforcement  learning  scheme  also  revealed 
the  potential  of  payoff-based  learning  approaches  (i.e.,  when  nodes  only  have  access  to  measurements  of 
their  utility)  for  equilibrium  selection  in  network  formation. 

A  few  directions  in  which  this  work  could  be  extended  include:  (a)  designing  alternative  utility  functions, 
(b)  reducing  communication  complexity,  and  (c)  designing  alternative  distributed  learning  processes.  In 
particular,  although  the  networks  emerging  through  the  proposed  scheme  exhibit  desirable  properties,  e.g., 
connectivity,  bounded-hop  diameter  and  small  number  of  links,  different  scenarios  may  require  alternative 
properties.  For  example,  minimal  number  of  links  may  not  be  desirable  due  to  issues  related  to  sensitivity  to 
failures.  Furthermore,  although  we  showed  analytically  that  the  proposed  reinforcement  learning  scheme 
converges  locally  to  the  strict  Nash  equilibria,  it  would  be  desirable  to  establish  global  convergence 
arguments,  which  is  currently  an  open  research  problem,  not  necessarily  restricted  to  network  formation 
games. 
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Appendix 

Proof  of  Proposition  5.1:  Part  1:  Suppose  a  network,  a,  is  not  a  Nash  equilibrium.  Suppose  further 
that  it  is  not  connected.  Then  there  exist  i,j  E  1  such  that  j  E  Ay  and  (j  — >  i)  j-  a.  (Recall  our 
assumption  that  connectivity  is  feasible  with  the  underlying  neighborhood  structure.)  In  the  <£  framework, 
setting  a[  =  ay  U  j  increases  the  utility  of  agent  i  by  1  —  (k0  +  «q)  >  0  without  decreasing  the  utility  of 
any  other  agent.  Furthermore,  since  >  0, 

Pi((a',  a_i),  a'f)  >  ty((a;',  ay)  >  vfa,  ay).  (14) 


Therefore,  a'  E  IS*  (a)  and,  if  we  define  the  coordination  function  0  :  A  — >  R  such  that 

</>(«)  =  >i(a,ai) 

iei 


(15) 


(i.e.,  0  is  the  value  of  the  graph),  then  off  ,  a_t)  >  0(ay,o;_j).  Therefore,  the  coordination  property  is 
satisfied. 

Now  suppose  that  a  is  not  a  Nash  equilibrium  but  is  connected,  and  let  a'  E  BRj(a).  We  can  assume 
that  agent  i  maintains  connectivity  to  all  of  A f.  Otherwise,  by  arguments  above,  we  can  replace  a'  with 
another  a"  G  BR,(a)  by  adding  links  that  maintain  connectivity.  As  a  result,  the  key  difference  between  the 
new  network  {af  a_*)  and  old  network  (ay,  a-f)  is  that  agent  i  maintained  connectivity  with  fewer  links. 
This  does  not  reduce  the  utility  of  other  agents.  Again,  since  >  0,  the  action  a'  satisfies  a[  G  IS* (a). 
Therefore,  the  coordination  property  is  satisfied  when  we  define  the  coordination  function  0  as  in  (15). 

Part  2:  In  moving  from  the  connections  model  (f  to  the  limited  connections  model  £,  simple  connectivity 
to  neighbors  is  insufficient.  Rather,  neighbors  must  be  within  a  radius  of  K  to  contribute  to  benefits.  Now 
suppose  that  a  network,  a,  is  not  a  Nash  equilibrium.  Furthermore,  assume  that  there  exist  i,  j  E  X  such 
that  j  G  Afi  and  distQ(j,  i)  >  K,  i.e.,  neighbor  j  is  outside  of  the  benefit  radius  K.  In  the  £  framework, 
setting  a  •  =  a*  U  j  changes  the  utility  of  agent  i  by 


dj{a ')  \ 

1  +  dfa') ) 


-  A(a)  1-7 


dj(a) 

1  +  difx) 


—  (^o  +  Kl)- 


Since  agent  i  added  a  link,  ffcf)  >  ff(a)  +  1  and  dfo')  <  dfo).  These  inequalities  imply  that  the 
change  in  utility  is  at  least 

f1  -  71  ~~ry  t)  -  («o  +  «i)  >  1  -  7  -  (« o  +  k i), 

which  is  positive  by  assumption.  Therefore,  a[  G  BRi(oi').  Furthermore,  since  >  0,  we  have  1—  7  —  kq  > 
0  and  therefore  the  condition  (14)  is  also  satisfied,  i.e.,  a'  G  ISj(a:).  Thus,  if  we  define  0  as  in  (15),  then 
the  game  satisfies  the  coordination  property. 

Now  suppose  that  a  is  not  a  Nash  equilibrium  but  satisfies  dista(j,  i )  <  K  for  all  i,j  such  that  j  E  A f. 
Let  a[  G  BR*(a).  Again,  we  can  assume  that  agent  i  maintains  connectivity  (within  radius  K)  to  all  of  Af%. 
However,  unlike  the  £  framework,  maintaining  this  connectivity  to  neighbors  does  not  imply  that  other 
nodes  have  maintained  connectivity  to  their  neighbors  within  a  radius  K. 

Let  us  decompose  the  two  actions  a*  and  a[  in  terms  of  links  that  were  (i)  kept,  (ii)  added,  and  (iii) 
dropped.  Specifically,  define  the  disjoint  sets 


-Akeep  ai  FI  OJj, 
A(add  at  \  A0eep , 

A(drop  =  OLj\  A^keep- 


(16a) 

(16b) 

(16c) 
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Each  of  these  sets  is  a  subset  of  Nt.  Since  a[  is  a  better  reply, 

m  fl  —  7-1  7—77^7:)  —  K0  |iVadd  U  A^keepl  —  Ki  j-ZYadd |  > 

M  i1  ~  71  +  dfa))  ~  K°  |iVdrop  U  iVkeepl  ' 

Since  a  started  with  no  deficient  agents,  di(a)  =  0,  and  so 

«o(|iVdr0p|  -  |iVadd|)  -  Kl  |iVadd|  >  (17) 

1  +  CLi\Ot ') 

The  left-hand- side  in  (17)  is  bounded  from  above  by  (|A/j|  — l)/c0.  Assume  that  the  network  a'  has  deficient 
agents  in  Bfa']  K).  Then  the  right-hand- side  of  (17)  is  bounded  from  below  by  \J\ft  \  / 2,  and  so 


«o  > 


w 

i^i-i 


2 

2' 


This  contradicts  the  assumed  condition  on  kq.  Accordingly,  a'  must  not  have  deficient  agents  in  Bfaf,  K ). 
Intuitively,  the  assumed  bound  on  kq  implies  that  an  agent  will  not  sacrifice  downstream  deficiency  just 
to  reduce  its  number  of  links.  Since  K\  >  0,  a[  €  IS* (a).  Also,  since  the  network  ol  has  no  deficient 
agents,  none  of  the  utilities  of  agents  other  than  1  has  been  reduced.  Therefore,  by  defining  0  as  in  (15), 
the  game  satisfies  the  coordination  property.  ■ 

Proof  of  Proposition  5.4:  (Critically  connected  =>■  Nash)  Let  a*  correspond  to  a  critically  connected 
network.  Suppose  for  some  agent  i  G  I  and  some  action  a't,  a[  f  a*, 


«-<)>«*)  >  M(ai 


<x- 


(18) 


i.e.,  agent  V s  utility  of  a[  is  greater  than  that  of  a*.  From  the  proof  of  Proposition  5.1,  we  can  assume 
that  a!  is  also  connected.  As  in  (16),  we  can  write  a*  =  Nkeep  U  ATdrop,  and  a[  =  A^keep  U  iVadd .  Clearly 
if  iVdr0p  =  0,  then  (18)  cannot  hold  since  a*  is  connected.  Assume  that  iVdr0p  f  0.  The  utility  of  agent 
i  in  case  of  a*  equals 

a-i)i  ai)  =  ( n  -  1)  -  «o(|Akeep|  +  |Adrop|)- 

In  case  of  a',  the  utility  of  i  is 

Vi((a-,  a*_f),  a*)  =  (n  -  1)  -  K0(|A^keeP|  +  |iVadd|). 


Thus, 


a*)  -  vf(a*,  a*_f,  a*)  = 

^o(|A^drop|  —  |-Vadd|)  >  0. 

The  only  possibility  for  (18)  to  hold  is  if  |iVdr0p|  >  |iVadd|. 

We  now  show  that  |iVdr0p|  >  |iVadd|  contradicts  a*  being  a  critically  connected  network. 

—  For  each  element  of  iVadd,  construct  a  path  in  a*  to  i.  These  paths  must  pass  through  iVkeep  U  iVdrop. 

—  Since  1 7Vdr0p |  >  iVadd  |,  there  exists  a  k*  e  Ndrop  that  is  not  part  of  any  of  these  paths. 

—  Construct  a  path  in  a'  from  k*  to  i.  This  path,  prior  to  reaching  i  must  pass  through  (iVkeep  U  iVadd). 
Prior  to  hitting  a  node  in  (iVkeep  U  iVadd),  this  path  lies  in  a*. 

—  The  conclusion  is  a  path  from  k*  to  an  element  of  iVadd  or  an  element  of  iVkeep-  In  either  case,  the 
path  can  be  continued  in  a*  to  i  without  passing  through  k*.  This  contradicts  the  critically  connected 
assumption  on  a*. 

As  a  result,  a*  cannot  be  a  Nash  equilibrium. 

(Nash  Critically  connected)  Suppose  a  Nash  network  is  not  critically  connected.  Then  there  exists 
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an  agent  i  that  can  drop  a  direct  link  to  an  agent  j  e  Aft  but  still  maintain  connectivity  to  j,  and  hence 
receive  the  benefits  of  j  without  incurring  the  maintenance  cost  of  j.  Therefore,  the  original  network 
cannot  be  a  Nash  network.  ■ 


